Bootstrapping Semantic Lexicons for Technical Domains
Abstract
We address the task of bootstrapping a semantic lexicon from a list of seed terms and a large corpus. By restricting extraction to a small subset of semantically strong patterns, namely coordinations, we improve results significantly. We show that the restriction to coordinations has several additional benefits, such as improved extraction of multiword expressions and the possibility of scaling up previous efforts.
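The idea of growing a lexicon from seed terms through coordinations can be illustrated with a minimal sketch. It is not the authors' implementation: the regular expression, the function name `bootstrap`, the parameters `iterations` and `top_k`, and the toy corpus are all assumptions made for illustration, and the pattern only handles single-word conjuncts joined by "and" or "or" (no multiword expressions, no parsing).

```python
import re
from collections import Counter

# Hypothetical coordination pattern: two single-word conjuncts joined by "and"/"or".
COORD = re.compile(r"\b(\w+) (?:and|or) (\w+)\b")

# Toy corpus, invented for this example.
CORPUS = [
    "The kit includes resistors and capacitors.",
    "Replace capacitors or inductors as needed.",
    "Resistors and diodes were tested.",
]

def bootstrap(seeds, sentences, iterations=2, top_k=2):
    """Grow a lexicon by adding terms coordinated with known terms."""
    lexicon = set(seeds)
    for _ in range(iterations):
        counts = Counter()
        for sentence in sentences:
            for left, right in COORD.findall(sentence.lower()):
                # A candidate is the unknown conjunct of a known term.
                if left in lexicon and right not in lexicon:
                    counts[right] += 1
                elif right in lexicon and left not in lexicon:
                    counts[left] += 1
        if not counts:
            break  # nothing new was found, so stop early
        # Keep only the most frequently coordinated candidates.
        lexicon.update(term for term, _ in counts.most_common(top_k))
    return lexicon

print(sorted(bootstrap({"resistors"}, CORPUS)))
```

Starting from the single seed "resistors", the first pass picks up the terms coordinated with it directly, and the second pass reaches "inductors" through the newly added "capacitors". Capping additions with `top_k` is one simple way to limit semantic drift in such a loop.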
Similar resources
A Bootstrapping Method for Learning Semantic Lexicons using Extraction Pattern Contexts
This paper describes a bootstrapping algorithm called Basilisk that learns high-quality semantic lexicons for multiple categories. Basilisk begins with an unannotated corpus and seed words for each semantic category, which are then bootstrapped to learn new words for each category. Basilisk hypothesizes the semantic class of a word based on collective information over a large body of extraction ...
Relation Guided Bootstrapping of Semantic Lexicons
State-of-the-art bootstrapping systems rely on expert-crafted semantic constraints such as negative categories to reduce semantic drift. Unfortunately, their use introduces a substantial amount of supervised knowledge. We present the Relation Guided Bootstrapping (RGB) algorithm, which simultaneously extracts lexicons and open relationships to guide lexicon growth and reduce semantic drift. Thi...
Learning Semantic Lexicons using Graph Mutual Reinforcement based Bootstrapping
Bootstrapping has received considerable attention in many fields and has achieved good results, and semantic lexicons have proved useful for many natural language processing tasks. This paper presents an approach to learning semantic lexicons using a new bootstrapping method based on Graph Mutual Reinforcement. The approach uses only unlabeled data and a few seed wor...
Exploiting Strong Syntactic Heuristics and Co-Training to Learn Semantic Lexicons
We present a bootstrapping method that uses strong syntactic heuristics to learn semantic lexicons. The three sources of information are appositives, compound nouns, and ISA clauses. We apply heuristics to these syntactic structures, embed them in a bootstrapping architecture, and combine them with co-training. Results on WSJ articles and a pharmaceutical corpus show that this method obtains hi...
Development of the Multilingual Semantic Annotation System
This paper reports on our research to generate multilingual semantic lexical resources and develop multilingual semantic annotation software, which assigns each word in running text to a semantic category based on a lexical semantic classification scheme. Such tools have an important role in developing intelligent multilingual NLP, text mining and ICT systems. In this work, we aim to extend an ...